Fix global scheduling propagation to all dataplane components #318
@xjerod thanks for working on this. I just tested it and it only injected nodeSelectors to the |
    nodeSelector:
    {{- toYaml . | nindent 8 }}
    {{- if .Values.imageBuilder.buildkit.nodeSelector }}
    {{- include "imagebuilder.buildkit.scheduling.nodeSelector" . | nindent 6 }}
I'm using the customer-facing resources to find the gaps.
I set this in values:

    scheduling:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: union.ai/node-role
                    operator: In
                    values:
                      - services

But I see this is not picked up by this helper, which confirms what I found by testing.
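For illustration, here is a minimal sketch of how a helper could fall back to the global affinity when no per-service value is set. The helper name `example.buildkit.scheduling.affinity` and the value path `.Values.imageBuilder.buildkit.affinity` are assumptions made for this example, not code from this PR. It also does not attempt to merge with the hardcoded podAntiAffinity that the imagebuilder deployment keeps.

```yaml
{{/*
Hypothetical helper: the name and value paths are assumptions for illustration.
Prefers the per-service affinity; falls back to the global scheduling.affinity.
Emits nothing when both are empty.
Assumes .Values.imageBuilder.buildkit and .Values.scheduling are defined maps.
*/}}
{{- define "example.buildkit.scheduling.affinity" -}}
{{- $affinity := .Values.imageBuilder.buildkit.affinity | default .Values.scheduling.affinity -}}
{{- with $affinity -}}
affinity:
  {{- toYaml . | nindent 2 }}
{{- end -}}
{{- end -}}
```

A pod spec would then pull it in the same way as the nodeSelector helper in the diff above, for example `{{- include "example.buildkit.scheduling.affinity" . | nindent 6 }}`.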
|
just tested using these values and the only component that is left out without scheduling config is |
|
The above is only with the default subcharts enabled. Soon we'll have more of them enabled by default (like |
Summary
- scheduling block for nodeSelector, tolerations, and affinity. Users setting scheduling.tolerations and scheduling.nodeSelector to target a dedicated node pool would find these pods stuck in Pending.
- Added scheduling fields (tolerations: [], nodeSelector: {}) to subchart values (opencost, kube-state-metrics, metrics-server, monitoring stack) with documentation. Helm has no mechanism to auto-propagate parent values into subchart templates, so these must be set alongside the global scheduling block.

Changes
- _helpers.tpl: New scheduling helpers for prometheus, flyteconnector, imagebuilder.buildkit
- prometheus/deployment.yaml: Replaced inline scheduling with helper include
- flyteconnector/deployment.yaml: Same
- imagebuilder/deployment.yaml: Same (preserves existing hardcoded podAntiAffinity)
- values.yaml: Added subchart scheduling fields with documentation
- .gitignore: Ignore subchart artifacts extracted by helm dep update

Test plan
- make helm-test passes (includes two new snapshot tests)
- dataplane.global-scheduling: Verifies all 9 Union-owned deployments inherit global scheduling values
- dataplane.scheduling-override: Verifies per-service values (prometheus, flyteconnector) take precedence over global

NOTE: 12.5k of the 13k lines added come from two new generated test outputs for the new tests I added.
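As a rough sketch of what "set alongside the global scheduling block" could look like in a values file: the node label and taint below reuse the union.ai/node-role key from the review comment. The subchart keys (`kube-state-metrics`, `metrics-server`) and their top-level `nodeSelector`/`tolerations` fields are assumptions. The actual key names depend on how each dependency is named or aliased in Chart.yaml and on what each subchart's own values schema exposes.

```yaml
# Global block, consumed by the Union-owned templates through the scheduling helpers.
scheduling:
  nodeSelector:
    union.ai/node-role: services
  tolerations:
    - key: union.ai/node-role
      operator: Equal
      value: services
      effect: NoSchedule

# Subcharts do not read the block above, so the same settings are repeated
# under each subchart's key. Key and field names here are assumptions.
kube-state-metrics:
  nodeSelector:
    union.ai/node-role: services
  tolerations:
    - key: union.ai/node-role
      operator: Equal
      value: services
      effect: NoSchedule

metrics-server:
  nodeSelector:
    union.ai/node-role: services
  tolerations:
    - key: union.ai/node-role
      operator: Equal
      value: services
      effect: NoSchedule
```

Rendering the chart with `helm template` and such a file, then searching the output for pod specs without a nodeSelector, is one way to spot components that were missed.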
jan/wip-selfhosted-